perf: faster model resolution, JSON decoding, and request snapshotting by SantiagoDePolonia · Pull Request #413 · ENTERPILOT/GoModel

SantiagoDePolonia · 2026-06-18T15:29:10Z

Summary

Reduces per-request CPU and allocations on the gateway hot path. All changes are behavior-preserving for valid input; the one intentional difference (goccy input leniency) is documented and pinned by a test. Verified with the full make test-race suite, make lint, and the hot-path perf guard (all green via pre-commit hooks).

Changes

1. O(1) model resolution

Per request, the router resolved the model selector ~6× and each qualified resolution copied the entire model catalog and linear-scanned it (ListModelsWithProvider).

- Added a lazy provider-selector index to the registry (qualifiedByName / qualifiedByType), built once and cleared at the existing single invalidation point.
- Routed resolution through it via an optional qualifiedSelectorResolver interface; the catalog scan remains as a fallback for non-indexed lookups and raw slash-shaped model IDs.
- Deduplicated the now-redundant name/type scans in resolveQualifiedSelector.

Measured: resolution is now O(1), so its cost no longer grows with catalog size.

| Catalog | Before (per request) | After (per request) |
| --- | --- | --- |
| 300 models | 31,454 ns / 164 KB | 800 ns / 0.3 KB |
| 1000 models | 95,930 ns / 459 KB | 814 ns / 0.3 KB |
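
For readers who want a concrete picture of the lazy index, the following is a minimal sketch and not the code from this PR. The `registry` and `modelInfo` types, the `selectorKey` helper, and the "first entry wins" collision rule are all assumptions made for illustration; only the map names `qualifiedByName` / `qualifiedByType` come from the description above.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// modelInfo is a hypothetical, trimmed-down catalog entry.
type modelInfo struct {
	ProviderName string
	ProviderType string
	ModelID      string
}

// registry is an illustrative stand-in for the real ModelRegistry.
type registry struct {
	mu              sync.RWMutex
	models          []modelInfo
	qualifiedByName map[string]modelInfo // nil means "index not built yet"
	qualifiedByType map[string]modelInfo
}

// selectorKey normalizes both halves so build-time and lookup-time keys agree.
func selectorKey(segment, modelID string) string {
	return strings.TrimSpace(segment) + "/" + strings.TrimSpace(modelID)
}

// buildIndexLocked must be called with r.mu held for writing.
func (r *registry) buildIndexLocked() {
	r.qualifiedByName = make(map[string]modelInfo, len(r.models))
	r.qualifiedByType = make(map[string]modelInfo, len(r.models))
	for _, m := range r.models {
		if name := strings.TrimSpace(m.ProviderName); name != "" {
			key := selectorKey(name, m.ModelID)
			if _, exists := r.qualifiedByName[key]; !exists { // assumed rule: first entry wins
				r.qualifiedByName[key] = m
			}
		}
		if typ := strings.TrimSpace(m.ProviderType); typ != "" {
			key := selectorKey(typ, m.ModelID)
			if _, exists := r.qualifiedByType[key]; !exists {
				r.qualifiedByType[key] = m
			}
		}
	}
}

// lookupLocked must be called with r.mu held and the index already built.
func (r *registry) lookupLocked(segment, modelID string) (modelInfo, bool) {
	key := selectorKey(segment, modelID)
	if m, ok := r.qualifiedByName[key]; ok {
		return m, true
	}
	m, ok := r.qualifiedByType[key]
	return m, ok
}

// Resolve does a map lookup under the read lock once the index exists,
// and builds the index under the write lock on first use.
func (r *registry) Resolve(segment, modelID string) (modelInfo, bool) {
	r.mu.RLock()
	if r.qualifiedByName != nil {
		m, ok := r.lookupLocked(segment, modelID)
		r.mu.RUnlock()
		return m, ok
	}
	r.mu.RUnlock()

	r.mu.Lock()
	defer r.mu.Unlock()
	if r.qualifiedByName == nil { // re-check: another goroutine may have built it
		r.buildIndexLocked()
	}
	return r.lookupLocked(segment, modelID)
}

// SetModels replaces the catalog and clears the index in one place.
func (r *registry) SetModels(models []modelInfo) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.models = models
	r.qualifiedByName = nil
	r.qualifiedByType = nil
}

func main() {
	r := &registry{}
	r.SetModels([]modelInfo{{ProviderName: "openai-main", ProviderType: "openai", ModelID: "gpt-5-mini"}})
	m, ok := r.Resolve("openai", "gpt-5-mini")
	fmt.Println(m.ProviderName, ok)
}
```

The double-checked build keeps the steady-state path on the read lock, and clearing both maps inside `SetModels` mirrors the idea of a single invalidation point.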

2. JSON decoding → `goccy/go-json`

- Migrated internal/ + cmd/ from encoding/json to github.com/goccy/go-json (true drop-in; package is named json). gjson unchanged. Test files intentionally stay on encoding/json as a stdlib oracle.
- ~3.8× faster realistic chat-body decode (39,000 → 10,300 ns) with fewer allocations.
- Dropped the redundant gjson.ValidBytes walk in extractUnknownJSONFields (callers already validate via the preceding Unmarshal).

Behavior note: goccy is slightly more lenient than stdlib on a couple of malformed inputs (leading-zero numbers; malformed values inside skipped passthrough fields). Accepted under the gateway's "accept generously" (Postel's Law) principle and pinned by TestDecoderLeniencyIsBounded. All valid input decodes identically.

3. Request-snapshot allocations

- Added NewRequestSnapshotWithOwnedMaps so ingress capture owns the freshly-built route/query/trace maps and body, cloning only the live header map.
- Added zero-copy HeadersView and pointed read-only callers at it.
- Removed the now-superseded NewRequestSnapshotWithOwnedBody constructor.
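
The ownership split can be sketched as follows. This is a simplified illustration, not the real `RequestSnapshot`: the `snapshot` struct, its two fields, and `newSnapshotWithOwnedMaps` are assumptions, and the real constructor takes more parameters.

```go
package main

import "fmt"

// snapshot is a hypothetical, reduced version of a request snapshot.
type snapshot struct {
	routeParams map[string]string
	headers     map[string][]string
}

// cloneHeaders deep-copies a header map so later mutations do not leak.
func cloneHeaders(h map[string][]string) map[string][]string {
	if h == nil {
		return nil
	}
	out := make(map[string][]string, len(h))
	for k, v := range h {
		out[k] = append([]string(nil), v...)
	}
	return out
}

// newSnapshotWithOwnedMaps takes ownership of routeParams (no copy) because
// the caller built it fresh, but clones headers because that map is live.
func newSnapshotWithOwnedMaps(routeParams map[string]string, headers map[string][]string) *snapshot {
	return &snapshot{routeParams: routeParams, headers: cloneHeaders(headers)}
}

// GetHeaders returns a defensive copy, safe for callers that mutate.
func (s *snapshot) GetHeaders() map[string][]string { return cloneHeaders(s.headers) }

// HeadersView returns the internal map without copying.
// Callers must treat it as read-only.
func (s *snapshot) HeadersView() map[string][]string { return s.headers }

func main() {
	route := map[string]string{"provider": "openai"}
	headers := map[string][]string{"X-Test": {"a"}}
	s := newSnapshotWithOwnedMaps(route, headers)

	headers["X-Test"][0] = "mutated" // does not affect the snapshot
	route["provider"] = "anthropic"  // visible: the snapshot owns this map

	fmt.Println(s.HeadersView()["X-Test"][0], s.routeParams["provider"])
}
```

The zero-copy view only pays off if read-only callers really never write through it, which is why the copying accessor stays available for everyone else.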

4. Perf harness

The gateway hot-path benchmark previously passed a bare provider to server.New, bypassing the Router/registry entirely — so the perf guard protected a path that doesn't exist in production. It now wires the real Router + a populated catalog, with a guard case. Added a resolution micro-benchmark (resolve_bench_test.go).

Impact framing

JSON decode and model resolution are a slice of per-request work (the upstream LLM call dominates wall-clock), so this is primarily a throughput / CPU / GC win across every endpoint, not a dramatic per-request latency drop.

Risks / follow-ups

- New dependency: github.com/goccy/go-json (MIT, pure Go, v0.10.6) is now on the core hot path, which is worth a conscious sign-off.
- Benchmarks ran on darwin/arm64. goccy is pure Go, so results should be consistent across platforms, but it is still worth confirming the win on linux/amd64 (the prod arch) in CI before merge.

Test plan

make test-race (full suite, 58 packages) — green
make lint — green
hot-path perf guard — green
linux/amd64 benchmark confirmation (CI)

🤖 Generated with Claude Code

Summary by CodeRabbit

Performance Improvements
- Improved JSON processing throughput across the service by switching to a faster JSON implementation.
- Reduced per-request model/provider selection overhead via cached, routed selector resolution.
- Expanded hot-path benchmarking to measure both bare and routed request flows.
Bug Fixes
- Bounded the new JSON decoder's leniency: a small, test-pinned set of malformed patterns is now tolerated, while invalid syntax is still rejected.
Tests
- Added/updated benchmarks for selector resolution and routed hot-path handling, including updated snapshot/header-related test coverage.

Reduce per-request CPU and allocations on the gateway hot path. Changes are behavior-preserving for all valid input; the one intentional difference is documented and tested.

Model resolution (O(1)):
- Add a lazy provider-selector index to the registry (qualifiedByName / qualifiedByType), invalidated at the existing single cache-invalidation point.
- Route qualified-selector resolution through it via an optional qualifiedSelectorResolver interface, with the catalog scan kept as a fallback for non-indexed lookups and raw slash-shaped model IDs.
- Resolution is now O(1) and constant in catalog size (was O(N), copying the full catalog several times per request): ~31us/164KB -> ~0.8us/0.3KB at 300 models. Deduplicated the redundant name/type scans in resolveQualifiedSelector.

JSON decoding (goccy/go-json):
- Migrate internal/ + cmd/ from encoding/json to github.com/goccy/go-json (drop-in; package is named json). gjson is unchanged.
- ~3.8x faster realistic chat-body decode with fewer allocations.
- goccy is slightly more lenient than encoding/json on a couple of malformed inputs (leading-zero numbers; malformed values in skipped passthrough fields). Accepted under the gateway's accept-generously principle and pinned by TestDecoderLeniencyIsBounded.
- Drop the redundant gjson.ValidBytes walk in extractUnknownJSONFields (callers already validate via the preceding Unmarshal).

Request snapshot allocations:
- Add NewRequestSnapshotWithOwnedMaps so ingress capture owns the freshly built route/query/trace maps and body, cloning only the live header map.
- Add HeadersView (zero-copy) and route read-only callers to it.
- Remove the now-superseded NewRequestSnapshotWithOwnedBody constructor.

Perf harness:
- Make the gateway hot-path benchmark exercise the real Router + populated catalog (it previously bypassed routing, giving false confidence) and add a guard case for it. Add a resolution micro-benchmark.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

greptile-apps · 2026-06-18T15:29:15Z

Too many files changed for review. (126 files found, 100 file limit)

coderabbitai · 2026-06-18T15:29:34Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6f4b8cd6-af65-4a5d-ab0a-55c3e3366e5a

📥 Commits

Reviewing files that changed from the base of the PR and between 4f26d43 and 39d477c.

📒 Files selected for processing (3)

internal/core/request_snapshot_test.go
internal/providers/registry.go
internal/providers/resolve_bench_test.go

📝 Walkthrough

Walkthrough

The PR replaces encoding/json with github.com/goccy/go-json across the entire codebase (~130+ files), renames NewRequestSnapshotWithOwnedBody to NewRequestSnapshotWithOwnedMaps and adds a zero-copy HeadersView() accessor, removes an internal gjson.ValidBytes pre-check from extractUnknownJSONFields, adds an O(1) qualified-selector index (qualifiedByName/qualifiedByType) to ModelRegistry with a qualifiedSelectorResolver fast path in Router, and extends hot-path benchmarks to cover the routed resolution path.

Changes

go-json migration, RequestSnapshot refactor, O(1) model resolution, and benchmarks

| Layer / File(s) | Summary |
| --- | --- |
| Dependency addition and RequestSnapshot API refactor<br/>`go.mod`, `internal/core/request_snapshot.go`, `internal/core/request_snapshot_test.go`, `internal/server/request_snapshot.go`, `internal/auditlog/entry_capture.go`, `internal/responsecache/responsecache.go`, `internal/providers/xai/xai.go` | Adds `github.com/goccy/go-json v0.10.6`; replaces `NewRequestSnapshotWithOwnedBody` with `NewRequestSnapshotWithOwnedMaps` (owns all maps, only clones `headers`); adds `HeadersView()` zero-copy accessor; migrates `internalJSONAuditHeaders`, `internalRequestHeaders`, and `xGrokConversationIDFromSnapshot` from `GetHeaders()` to `HeadersView()`. |
| json_fields behavioral change and leniency tests<br/>`internal/core/json_fields.go`, `internal/core/json_fields_test.go` | Switches to `go-json`, removes internal `gjson.ValidBytes` pre-validation from `extractUnknownJSONFields` (relies on caller-guaranteed valid JSON), replaces the invalid-syntax test to target `ChatRequest.UnmarshalJSON`, and adds `TestDecoderLeniencyIsBounded` pinning two accepted `go-json` leniencies in passthrough fields. |
| ModelRegistry O(1) selector index<br/>`internal/providers/registry.go`, `internal/providers/registry_metadata.go` | Adds `qualifiedByName` and `qualifiedByType` cached maps; implements `ResolveProviderSelector` with lazy index building under write lock and O(1) read-lock lookup; introduces `buildSelectorIndexLocked` with deterministic collision handling and `lookupSelectorIndex` helper; clears index maps on cache invalidation; swaps JSON imports to `go-json`. |
| Router qualifiedSelectorResolver fast path<br/>`internal/providers/router.go` | Adds unexported `qualifiedSelectorResolver` interface; updates `resolveQualifiedSelector` to attempt fast-path lookup first and fall back to `resolveProviderOwnedRawSelector` scan when unavailable; swaps JSON imports to `go-json`. |
| Resolution and hot-path performance benchmarks<br/>`internal/providers/resolve_bench_test.go`, `tests/perf/hotpath_test.go`, `tests/perf/README.md` | Adds `BenchmarkResolvePerRequest` and `BenchmarkListModelsWithProvider` with `buildBenchRegistry` helper; extends `hotpath_test.go` with `benchProvider.models` field, `newRoutedBenchServer` factory, `BenchmarkGatewayHotPathChatCompletionRouted`, and a new routed-path ceiling in `TestHotPathPerfGuard`; documents bare vs. routed benchmark differences in README. |
| Global encoding/json → goccy/go-json import swap<br/>`cmd/...`, `internal/admin/...`, `internal/aliases/...`, `internal/anthropicapi/...`, `internal/app/...`, `internal/auditlog/...`, `internal/batch/...`, `internal/cache/...`, `internal/conversationstore/...`, `internal/core/...`, `internal/embedding/...`, `internal/gateway/...`, `internal/guardrails/...`, `internal/live/...`, `internal/llmclient/...`, `internal/modeldata/...`, `internal/modeloverrides/...`, `internal/pricingoverrides/...`, `internal/providers/...`, `internal/responsecache/...`, `internal/responsestore/...`, `internal/server/...`, `internal/streaming/...`, `internal/usage/...`, `internal/workflows/...` | Replaces `encoding/json` with `github.com/goccy/go-json` (imported as `json`) across all remaining files; all existing `json.Marshal`, `json.Unmarshal`, `json.RawMessage`, `json.NewDecoder`, `json.Valid`, and `json.Number` usages now resolve to the new library. |

Sequence Diagram(s)

Model resolution now supports an optional O(1) fast path when ModelRegistry provides a cached selector index, falling back to catalog scan when the resolver is unavailable:

sequenceDiagram
  participant req as HTTP Request
  participant router as Router
  participant resolver as ModelRegistry<br/>qualifiedSelectorResolver
  participant fallback as resolveProviderOwnedRawSelector
  participant found as core.ModelSelector

  req->>router: resolveQualifiedSelector(segment, modelID)
  router->>resolver: ResolveProviderSelector(segment, modelID)
  alt fast path available
    resolver->>resolver: RLock → lookupSelectorIndex
    alt index cache hit
      resolver-->>router: ModelSelector, ok=true
    else index miss
      resolver->>resolver: WLock → buildSelectorIndexLocked
      resolver->>resolver: populate qualifiedByName, qualifiedByType
      resolver-->>router: ModelSelector, ok=true/false
    end
  else no resolver (fallback)
    router->>fallback: scan catalog for match
    fallback-->>router: ModelSelector, ok=true/false
  end
  router-->>found: matched provider selector
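
The optional-interface dispatch in the diagram can be sketched in Go as below. This is an illustration under assumptions, not the PR's router: `selector`, `catalogLister`, `scanLookup`, and `indexedLookup` are hypothetical, and the sketch follows the diagram in returning the resolver's answer directly whenever the fast path is available. Whether the real router also scans after an index miss is not shown here.

```go
package main

import "fmt"

// selector is a hypothetical resolved provider/model pair.
type selector struct{ Provider, Model string }

// catalogLister is the minimal dependency the router always has.
type catalogLister interface {
	ListModels() []selector
}

// qualifiedSelectorResolver is the optional fast path. A catalog may or may
// not implement it; the router discovers that with a type assertion.
type qualifiedSelectorResolver interface {
	ResolveProviderSelector(segment, modelID string) (selector, bool)
}

type router struct{ catalog catalogLister }

func (r *router) resolveQualifiedSelector(segment, modelID string) (selector, bool) {
	// Fast path: use the index when the catalog provides one.
	if fast, ok := r.catalog.(qualifiedSelectorResolver); ok {
		return fast.ResolveProviderSelector(segment, modelID)
	}
	// Fallback: linear scan over a copy of the catalog.
	for _, s := range r.catalog.ListModels() {
		if s.Provider == segment && s.Model == modelID {
			return s, true
		}
	}
	return selector{}, false
}

// scanLookup only supports listing, so the router must scan.
type scanLookup struct{ models []selector }

func (l scanLookup) ListModels() []selector { return l.models }

// indexedLookup adds the optional method backed by a prebuilt map.
type indexedLookup struct {
	scanLookup
	index map[string]selector
}

func (l indexedLookup) ResolveProviderSelector(segment, modelID string) (selector, bool) {
	s, ok := l.index[segment+"/"+modelID]
	return s, ok
}

func main() {
	want := selector{Provider: "openai", Model: "gpt-5-mini"}
	slow := &router{catalog: scanLookup{models: []selector{want}}}
	fast := &router{catalog: indexedLookup{index: map[string]selector{"openai/gpt-5-mini": want}}}
	fmt.Println(slow.resolveQualifiedSelector("openai", "gpt-5-mini"))
	fmt.Println(fast.resolveQualifiedSelector("openai", "gpt-5-mini"))
}
```

Keeping the interface unexported and optional lets test doubles and alternative catalogs keep working through the scan without implementing the index.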

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

ENTERPILOT/GoModel#158: The main PR's go-json import swap in internal/streaming/observed_sse_stream.go directly affects the JSON unmarshalling path used by the shared ObservedSSEStream/observer SSE parsing introduced in that PR.
ENTERPILOT/GoModel#293: Main PR's JSON implementation swap in internal/usage/cost.go (including json.Number/numeric parsing paths) directly impacts the OpenRouter CalculateUsageCost logic introduced in that PR.
ENTERPILOT/GoModel#389: Main PR's JSON implementation swap in internal/core/chat_content.go directly overlaps with #389's new/updated input_audio validation and marshaling logic in the same file.

Poem

🐇 Hop hop, import swap complete,
No more stdlib JSON, what a feat!
go-json now zips through every byte,
O(1) routing gleaming bright.
My carrot cache resolves in a blink —
Faster than you'd dare to think! 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title accurately and concisely summarizes the main optimizations: faster model resolution (O(1)), JSON decoding (goccy library migration), and request snapshotting (new owned-maps constructor). |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


codecov-commenter · 2026-06-18T15:37:22Z


Codecov Report

❌ Patch coverage is 86.84211% with 10 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| internal/providers/registry.go | 87.75% | 3 Missing and 3 partials ⚠️ |
| internal/core/request_snapshot.go | 88.23% | 1 Missing and 1 partial ⚠️ |
| internal/providers/router.go | 66.66% | 1 Missing and 1 partial ⚠️ |


coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

internal/core/request_snapshot_test.go (1)

74-106: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Expand owned-maps test coverage to include map/header semantics.

The renamed test currently pins only captured-body ownership. Please also assert that owned route/query/trace maps are not cloned, while headers are still cloned, to lock in the full constructor contract.

💡 Suggested test extension

 func TestNewRequestSnapshotWithOwnedMaps_TakesOwnershipOfCapturedBytes(t *testing.T) {
+	routeParams := map[string]string{"provider": "openai"}
+	queryParams := map[string][]string{"limit": {"5"}}
+	headers := map[string][]string{"X-Test": {"a"}}
+	traceMetadata := map[string]string{"Traceparent": "trace-1"}
 	rawBody := []byte(`{"model":"gpt-5-mini"}`)
 
 	snapshot := NewRequestSnapshotWithOwnedMaps(
 		"POST",
 		"/v1/chat/completions",
-		nil,
-		nil,
-		nil,
+		routeParams,
+		queryParams,
+		headers,
 		"application/json",
 		rawBody,
 		false,
 		"req-123",
-		nil,
+		traceMetadata,
 		"/team/a",
 	)
@@
 	if &clonedBody[0] == &rawBody[0] {
 		t.Fatal("CapturedBody returned owned bytes directly, want defensive copy")
 	}
+
+	routeParams["provider"] = "anthropic"
+	if got := snapshot.GetRouteParams()["provider"]; got != "anthropic" {
+		t.Fatalf("GetRouteParams provider = %q, want anthropic (owned map)", got)
+	}
+	queryParams["limit"][0] = "99"
+	if got := snapshot.GetQueryParams()["limit"][0]; got != "99" {
+		t.Fatalf("GetQueryParams limit = %q, want 99 (owned map)", got)
+	}
+	traceMetadata["Traceparent"] = "trace-2"
+	if got := snapshot.GetTraceMetadata()["Traceparent"]; got != "trace-2" {
+		t.Fatalf("GetTraceMetadata Traceparent = %q, want trace-2 (owned map)", got)
+	}
+	headers["X-Test"][0] = "mutated"
+	if got := snapshot.GetHeaders()["X-Test"][0]; got != "a" {
+		t.Fatalf("GetHeaders X-Test = %q, want a (cloned headers)", got)
+	}
 }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/core/request_snapshot_test.go` around lines 74 - 106, The
TestNewRequestSnapshotWithOwnedMaps_TakesOwnershipOfCapturedBytes test currently
only validates captured body ownership but does not test the complete contract
of the NewRequestSnapshotWithOwnedMaps constructor regarding map and header
semantics. Add assertions to verify that route, query, and trace maps passed to
NewRequestSnapshotWithOwnedMaps are not cloned by the snapshot (confirming
ownership is taken), while headers should still be defensively cloned to prevent
external mutations. Create sample maps for these fields, pass them through the
constructor, retrieve them via appropriate accessor methods, and verify the
pointer equality or inequality as needed to confirm the ownership vs cloning
behavior for each map type.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/providers/registry.go`:
- Around line 174-183: The code builds selector-index keys using potentially
untrimmed values for publicName and info.ProviderType, while the lookup paths
(lines 126-127) use trimmed inputs. Apply strings.TrimSpace() to normalize
publicName when it is assigned from info.ProviderName, and also apply
strings.TrimSpace() to info.ProviderType before using it in the typeKey
construction. This ensures that keys built during registration match keys used
during lookups even when config/provider metadata contains whitespace padding.

In `@internal/providers/resolve_bench_test.go`:
- Around line 33-35: The benchmark test case labels are mislabeled due to
integer truncation in the buildBenchRegistry call. In the benchmark loop that
iterates over slice values (50, 300, 1000), the calculation n/6 for the
per-provider count causes truncation, resulting in actual model counts that
differ from the label (models=50 actually benchmarks 48 models, models=1000
actually benchmarks 996 models). Fix this by either changing the loop values to
numbers divisible by 6 such that when multiplied back (divideCount * 6) they
equal the intended model count, or recalculate the label to show the actual
number of models being created by computing divideCount * 6 and using that value
in the b.Run label instead of n. Apply the same fix to the second benchmark loop
also mentioned in the comment.

In `@tests/perf/README.md`:
- Around line 26-29: The README.md file contains outdated performance
documentation in the section describing the routed path (around lines 26-29).
The current text still references repeated full-catalog copies per request, but
the actual implementation now uses O(1) selector-index behavior where resolution
is computed once per request and reused. Update the routed path performance
explanation to accurately reflect this current behavior by removing or revising
the outdated statement about order of magnitude allocations from repeated
catalog copies, and instead describe how resolution is now computed once and
reused, which provides the O(1) performance characteristics referenced in the
perf-guard commentary.



ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: cbf95ee6-7e26-4f2d-b8a7-0d7f47aad549

📥 Commits

Reviewing files that changed from the base of the PR and between 9025183 and 1283ab9.

⛔ Files ignored due to path filters (1)

go.sum is excluded by !**/*.sum

📒 Files selected for processing (125)

cmd/gomodel/health.go
cmd/recordapi/main.go
go.mod
internal/admin/handler_guardrails.go
internal/admin/handler_live.go
internal/aliases/batch_preparer.go
internal/anthropicapi/request.go
internal/anthropicapi/response.go
internal/anthropicapi/stream.go
internal/anthropicapi/types.go
internal/app/app.go
internal/auditlog/auditlog.go
internal/auditlog/entry_capture.go
internal/auditlog/middleware.go
internal/auditlog/reader_postgresql.go
internal/auditlog/reader_sqlite.go
internal/batch/store.go
internal/cache/modelcache/local.go
internal/cache/modelcache/modelcache.go
internal/cache/modelcache/redis.go
internal/conversationstore/store.go
internal/conversationstore/store_memory.go
internal/core/audio.go
internal/core/batch.go
internal/core/batch_json.go
internal/core/batch_preparation.go
internal/core/chat_content.go
internal/core/chat_json.go
internal/core/conversations.go
internal/core/embeddings_encoding.go
internal/core/embeddings_json.go
internal/core/errors.go
internal/core/json_fields.go
internal/core/json_fields_test.go
internal/core/message_json.go
internal/core/request_snapshot.go
internal/core/request_snapshot_test.go
internal/core/responses.go
internal/core/responses_json.go
internal/core/semantic_canonical.go
internal/core/types.go
internal/core/usage_json.go
internal/embedding/embedding.go
internal/gateway/batch_usage.go
internal/guardrails/batch_rewrite.go
internal/guardrails/batch_rewrite_test.go
internal/guardrails/definitions.go
internal/guardrails/executor.go
internal/guardrails/responses_message_apply.go
internal/guardrails/store_mongodb.go
internal/live/broker.go
internal/llmclient/client.go
internal/modeldata/fetcher.go
internal/modeloverrides/batch_preparer.go
internal/modeloverrides/store.go
internal/modeloverrides/store_postgresql.go
internal/modeloverrides/store_sqlite.go
internal/pricingoverrides/store.go
internal/pricingoverrides/store_postgresql.go
internal/pricingoverrides/store_sqlite.go
internal/providers/anthropic/anthropic.go
internal/providers/anthropic/batch.go
internal/providers/anthropic/chat.go
internal/providers/anthropic/chat_stream.go
internal/providers/anthropic/request_translation.go
internal/providers/anthropic/responses.go
internal/providers/anthropic/types.go
internal/providers/bailian/bailian.go
internal/providers/batch_results_file_adapter.go
internal/providers/bedrock/chat.go
internal/providers/bedrock/chat_stream.go
internal/providers/chat_stream_normalize.go
internal/providers/deepseek/deepseek.go
internal/providers/gemini/gemini.go
internal/providers/gemini/native.go
internal/providers/gemini/native_stream.go
internal/providers/googlecommon/auth.go
internal/providers/ollama/ollama.go
internal/providers/openai/openai.go
internal/providers/registry.go
internal/providers/registry_metadata.go
internal/providers/resolve_bench_test.go
internal/providers/responses_adapter.go
internal/providers/responses_content.go
internal/providers/responses_converter.go
internal/providers/responses_input.go
internal/providers/responses_output.go
internal/providers/responses_output_state.go
internal/providers/router.go
internal/providers/vertex/vertex.go
internal/providers/xai/xai.go
internal/providers/xiaomi/audio.go
internal/responsecache/responsecache.go
internal/responsecache/semantic.go
internal/responsecache/simple.go
internal/responsecache/sse_validation.go
internal/responsecache/stream_cache.go
internal/responsecache/stream_cache_chat.go
internal/responsecache/stream_cache_responses.go
internal/responsecache/vecstore_pinecone.go
internal/responsecache/vecstore_qdrant.go
internal/responsecache/vecstore_weaviate.go
internal/responsestore/store.go
internal/server/conversation_responses.go
internal/server/internal_chat_completion_executor.go
internal/server/native_conversation_service.go
internal/server/native_response_service.go
internal/server/request_selector_peek.go
internal/server/request_snapshot.go
internal/server/response_input_items.go
internal/server/translated_inference_service.go
internal/streaming/observed_sse_stream.go
internal/usage/audio.go
internal/usage/cost.go
internal/usage/extractor.go
internal/usage/reader_postgresql.go
internal/usage/reader_sqlite.go
internal/usage/realtime.go
internal/usage/recalculate_pricing.go
internal/usage/store_sqlite.go
internal/workflows/store_postgresql.go
internal/workflows/store_sqlite.go
internal/workflows/types.go
tests/perf/README.md
tests/perf/hotpath_test.go

coderabbitai · 2026-06-18T15:47:11Z

+covers the per-request resolution path. This routed path currently allocates an
+order of magnitude more per request because resolution re-copies the full model
+catalog several times; its guard ceilings should tighten significantly once
+resolution is computed once per request and reused.


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

README performance explanation is outdated for the routed path.

Lines 26-29 still describe repeated full-catalog copies per request, but the routed perf-guard commentary now assumes O(1) selector-index behavior. This mismatch can mislead perf investigations.

Suggested fix

-`BenchmarkGatewayHotPathChatCompletionRouted` wires a real `Router` +
-`ModelRegistry` (the production shape) with a representative catalog, so it
-covers the per-request resolution path. This routed path currently allocates an
-order of magnitude more per request because resolution re-copies the full model
-catalog several times; its guard ceilings should tighten significantly once
-resolution is computed once per request and reused.
+`BenchmarkGatewayHotPathChatCompletionRouted` wires a real `Router` +
+`ModelRegistry` (the production shape) with a representative catalog, so it
+covers the per-request resolution path. With the selector-index fast path,
+routed overhead should stay close to the bare-provider case and avoid
+catalog-size-linear per-request copying for qualified-selector resolution.



CI (linux/amd64) and local (darwin/arm64) produce identical allocation counts and near-identical byte counts, confirming these are deterministic. Tighten the ceilings from "intentionally generous" to ~10% over the measured baseline so the guard catches smaller regressions while still absorbing Go/dependency drift:

- hot_path: 125 -> 120 allocs (baseline 113)
- routed: 160 -> 150 allocs, 18 -> 16 KB (baseline 137 / ~14.7 KB)
- responses_stream: 310 -> 222 allocs, 25 -> 22 KB (baseline 202 / ~19.6 KB)
- shared_observers: unchanged (already tight, no headroom to trim)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
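
As a rough illustration of how an allocation ceiling of this kind can be checked, the sketch below computes the headroom of each allocation ceiling quoted in this commit message and shows `testing.AllocsPerRun` measuring a stand-in workload. The `guardCeiling` type and the helper names are hypothetical; the actual `TestHotPathPerfGuard` is not reproduced here.

```go
package main

import (
	"fmt"
	"testing"
)

// guardCeiling is a hypothetical record of one perf-guard case.
type guardCeiling struct {
	name      string
	baseline  float64 // measured allocations per operation
	maxAllocs float64 // ceiling enforced by the guard
}

// headroomPct reports how far the ceiling sits above the baseline, in percent.
func headroomPct(c guardCeiling) float64 {
	return (c.maxAllocs - c.baseline) / c.baseline * 100
}

// exceeds reports whether a measurement breaks the ceiling.
func exceeds(measured float64, c guardCeiling) bool {
	return measured > c.maxAllocs
}

// sink forces the sample workload's allocation to escape to the heap.
var sink []byte

func main() {
	// Allocation baselines and ceilings as listed in the commit message.
	ceilings := []guardCeiling{
		{"hot_path", 113, 120},
		{"routed", 137, 150},
		{"responses_stream", 202, 222},
	}
	for _, c := range ceilings {
		fmt.Printf("%s: %.1f%% headroom\n", c.name, headroomPct(c))
	}

	// Stand-in workload; a real guard would drive a request through the server.
	measured := testing.AllocsPerRun(100, func() { sink = make([]byte, 64) })
	fmt.Println("sample workload breaks hot_path ceiling:", exceeds(measured, ceilings[0]))
}
```

With these figures the allocation headroom works out to roughly 6% to 10%, consistent with the "~10% over the measured baseline" target.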

- registry: trim publicName/ProviderType when building the qualified-selector index and skip empty keys, matching the trimmed lookup inputs and the previous catalog scan (which compared trimmed fields on both sides). Prevents the O(1) fast path from missing when provider metadata carries whitespace padding.
- resolve_bench_test: build exactly totalModels (round-robin across providers) instead of providersN*(n/6); the models=50/1000 cases previously benchmarked 48/996 models due to integer truncation. Add benchSelector helper.
- request_snapshot_test: extend the owned-maps test to assert route/query/trace maps are owned (not cloned) while headers are still defensively cloned.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
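
The integer-truncation issue in the benchmark labels is easy to reproduce. The sketch below assumes six providers, as implied by the `n/6` expression, and uses hypothetical helper names (`truncatedTotal`, `roundRobinCounts`); it is not the benchmark code itself.

```go
package main

import "fmt"

// providersN is assumed from the n/6 expression in the review comment.
const providersN = 6

// truncatedTotal reproduces the old sizing: providersN * (n / providersN).
// Integer division drops the remainder, so the total can fall short of n.
func truncatedTotal(n int) int {
	return providersN * (n / providersN)
}

// roundRobinCounts assigns exactly total models across providers one at a
// time and returns how many each provider receives.
func roundRobinCounts(total, providers int) []int {
	counts := make([]int, providers)
	for i := 0; i < total; i++ {
		counts[i%providers]++
	}
	return counts
}

func main() {
	for _, n := range []int{50, 300, 1000} {
		counts := roundRobinCounts(n, providersN)
		sum := 0
		for _, c := range counts {
			sum += c
		}
		fmt.Printf("label=%d old=%d round-robin=%d %v\n", n, truncatedTotal(n), sum, counts)
	}
}
```

Round-robin assignment leaves providers at most one model apart, so the benchmark label and the actual catalog size always agree.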


SantiagoDePolonia and others added 2 commits June 18, 2026 18:52

SantiagoDePolonia merged commit 2677c1f into main Jun 18, 2026
19 checks passed

SantiagoDePolonia merged 3 commits into main from perf/optimization
